
Changes from Cole Trapnell #110

Merged
merged 14 commits on Nov 28, 2023

Conversation

brgew
Contributor

@brgew brgew commented Oct 26, 2023

There are essentially three groups of changes (in order of most important to least important for our lab):

  1. The computation and storage of the variance-covariance matrix on model parameters (theta) when using the bootstrap.
  2. An expanded interface for configuring the model initialization, to allow for customization of the "post-treatment" steps.
  3. Changes to the torch optimizer to smooth the way for running on GPUs.

Member

@jchiquet jchiquet left a comment


Some very minor comments that need a quick explanation before approval.

@@ -105,7 +105,7 @@ PLN_param <- function(
Omega = NULL,
config_post = list(),
config_optim = list(),
inception = NULL # pretrained PLNfit used as initialization
inception = NULL # pretrained PLNfit used as initialization,
Member

Why the additional comma if it's the last element of the list?

Contributor

oops :)

# CHECK_ME_TORCH_GPU
# This appears to be in torch_gpu only. The commented out line below is
# in both PLNmodels/master and PLNmodels/dev.
myPLN <- switch(control$covariance,
Member

Not sure why one would need to init with a diagonal or spherical PLN covariance model, when the goal is to fit a PLNnetwork model, that is, one with sparse precision/covariance matrix... But if used for completeness/compatibility, that is fine.

Collaborator

I have exactly the same question. Diagonal covariance matrix would result in an empty graph (with only isolated nodes).

Contributor

@maddyduran and I seem to remember issues when initializing PLNnetwork when you have fewer samples than species - if you use "fixed" there's a call to solve() that can throw an exception. We figured that "spherical" or diagonal would avoid this (skipping that call) and move on to estimating a penalized inverse covariance matrix even with a spherical or diagonal init.

Collaborator

@mahendra-mariadassou mahendra-mariadassou left a comment


Thank you very much for the nice PR and many changes, including the brand new jackknife estimator for the B matrix.

Like @jchiquet, I have a few comments before approving the PR. The main one concerns changes to PLNnetwork which I'm not sure are so useful: since a PLNnetworkfit is always an object of class PLNfit_fixedcov, there is limited use in passing a different variance structure in the control list (and it increases the potential for making mistakes).

@@ -107,6 +108,11 @@ trace <- function(x) sum(diag(x))
x
}

.logfactorial_torch <- function(n){
n[n == 0] <- 1 ## 0! = 1!
n*torch_log(n) - n + torch_log(8*torch_pow(n,3) + 4*torch_pow(n,2) + n + 1/30)/6 + log(pi)/2
Collaborator

For .logfactorial_torch(), shouldn't the final term log(pi)/2 be torch_log(pi)/2 ?

Contributor

Yes, though I'm not actually sure what would happen under the hood here in terms of evaluation.

.5 * torch_sum(torch_mm(params$M, params$Omega) * params$M + S2 * torch_diag(params$Omega), dim = 2)
Ji_tmp = Ji_tmp$cpu()
Ji_tmp = as.numeric(Ji_tmp)
Ji <- .5 * self$p - rowSums(.logfactorial(as.matrix(data$Y$cpu()))) + Ji_tmp
Collaborator

Why not use .logfactorial_torch() instead of .logfactorial()? Because the computation has already moved back to the CPU at this point?

Contributor

Good question. Perhaps it would be better to defer that so we can use .logfactorial_torch() instead

@@ -156,7 +159,7 @@ PLNfit <- R6Class(

## Check for convergence
if (delta_f < config$ftol_rel) status <- 3
if (delta_x < config$xtol_rel) status <- 4
#if (delta_x < config$xtol_rel) status <- 4
Collaborator

Is there a reason to remove the convergence check on the parameter values and keep only the one on the ELBO value? This will speed up the algorithm (fewer conditions to satisfy) but may cause the nlopt and torch implementations to diverge in the results they produce (not a bad thing per se, but something we need to be aware of).

Contributor

I defer to you on that - we have been generally happy with larger default values of ftol_rel, triggering convergence on status 3 most of the time.
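For what it's worth, one way to keep both criteria while letting users effectively disable the parameter-based one is to leave the check in place and rely on the tolerance value (a sketch based on the snippet quoted above; config field names as in that snippet):

```r
## Check for convergence; setting config$xtol_rel to Inf effectively
## disables the parameter-based criterion without removing the code path.
if (delta_f < config$ftol_rel) status <- 3
if (delta_x < config$xtol_rel) status <- 4
```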

@@ -217,6 +220,54 @@ PLNfit <- R6Class(
invisible(list(var_B = var_B, var_Omega = var_Omega))
},

compute_vcov_from_resamples = function(resamples){
# compute the covariance of the parameters
get_cov_mat = function(data, cell_group) {
Collaborator

I may be mistaken, but get_cov_mat() appears to be defined but never used anywhere in compute_vcov_from_resamples(). Is it necessary ?

Contributor

Vestigial code from debugging - can be removed

nullModel <- nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights)
#' @param config_post a list for controlling the post-treatment.
postTreatment = function(config_post, config_optim) {
#nullModel <- nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights)
Collaborator

No comparison to a null Poisson model in the general post-treatment of PLN families. Is it to improve speed / because it's not required for the post-treatments?

Contributor

In our applications, the call to glm.fit inside of nullModelPoisson() can sometimes throw an exception, which ruins a perfectly good PLN fit! Since it seemed to us that nullModelPoisson() was something one only needed to do when (optionally) computing the approximate R2, we thought this call was superfluous.
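If the null model were kept, one defensive option would be to guard the call so a glm.fit failure can't take down the whole fit (a sketch only, reusing the nullModelPoisson() call quoted above; the warning message is illustrative):

```r
# Don't let a glm.fit failure inside nullModelPoisson() ruin the PLN fit:
# fall back to NULL and warn, so the approximate R2 is simply unavailable.
nullModel <- tryCatch(
  nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights),
  error = function(e) {
    warning("null Poisson model failed: ", conditionMessage(e))
    NULL
  }
)
```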

variance_jackknife = function(Y, X, O, w, config = config_default_nlopt) {
jacks <- future.apply::future_lapply(seq_len(self$n), function(i) {
jacks <- lapply(seq_len(self$n), function(i) {
Collaborator

I have no strong opinion on this one as @jchiquet was the one who wrote this part, but is there a reason to prefer lapply to future_lapply (one less dependency? simpler to use?). A nice thing about future_lapply is that it is backend-agnostic and can be used with several parallelization paradigms.

Contributor

Ah, yes, agree it would be wonderful to keep this. The issue is that on machines that use OpenBLAS with a multithreaded backend, using future can deadlock the session. A workaround is to wrap calls to future with something like this:

old_omp_num_threads = as.numeric(Sys.getenv("OMP_NUM_THREADS"))
if (is.na(old_omp_num_threads)) {
  old_omp_num_threads = 1
}
RhpcBLASctl::omp_set_num_threads(1)

old_blas_num_threads = as.numeric(Sys.getenv("OPENBLAS_NUM_THREADS"))
if (is.na(old_blas_num_threads)) {
  old_blas_num_threads = 1
}
RhpcBLASctl::blas_set_num_threads(1)

Then you do the work with future, and finally restore the old settings:

RhpcBLASctl::omp_set_num_threads(old_omp_num_threads)
RhpcBLASctl::blas_set_num_threads(old_blas_num_threads)

We didn't add this because we didn't want to add a new dependency on RhpcBLASctl to the package, but you could do so if you want to be able to do linear algebra inside functions called by future.
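The save/pin/restore steps above could be wrapped into a single helper around the future call (a sketch; the helper name `with_single_threaded_blas` is hypothetical, and this assumes RhpcBLASctl is available):

```r
# Hypothetical helper: evaluate `expr` with OpenMP/OpenBLAS pinned to one
# thread, restoring the previous thread counts on exit (even on error).
with_single_threaded_blas <- function(expr) {
  old_omp  <- as.numeric(Sys.getenv("OMP_NUM_THREADS"))
  old_blas <- as.numeric(Sys.getenv("OPENBLAS_NUM_THREADS"))
  if (is.na(old_omp))  old_omp  <- 1
  if (is.na(old_blas)) old_blas <- 1
  RhpcBLASctl::omp_set_num_threads(1)
  RhpcBLASctl::blas_set_num_threads(1)
  on.exit({
    RhpcBLASctl::omp_set_num_threads(old_omp)
    RhpcBLASctl::blas_set_num_threads(old_blas)
  })
  force(expr)
}

# usage (illustrative):
# jacks <- with_single_threaded_blas(
#   future.apply::future_lapply(seq_len(self$n), function(i) ...)
# )
```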

data <- list(Y = Y[resample, , drop = FALSE],
X = X[resample, , drop = FALSE],
O = O[resample, , drop = FALSE],
w = w[resample])
#print (config$torch_device)
#print (config)
if (config$algorithm %in% c("RPROP", "RMSPROP", "ADAM", "ADAGRAD")) # hack, to know if we're doing torch or not
Collaborator

Should we add a backend element (set to the appropriate value) to config_default_nlopt and config_default_torch so that they are self-aware and we can use
if (config$backend == "torch") {...}

Contributor

That would be better. It would also be good to be able to specify the torch device (e.g. "mps", "cuda", etc.).
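A sketch of what that could look like, assuming config_default_nlopt and config_default_torch are plain lists (the torch_device field name follows the config$torch_device used elsewhere in the PR; the exact shape is illustrative):

```r
# Illustrative: make the default configs self-aware about their backend
config_default_nlopt <- utils::modifyList(
  config_default_nlopt, list(backend = "nlopt"))
config_default_torch <- utils::modifyList(
  config_default_torch, list(backend = "torch",
                             torch_device = "cpu"))  # or "cuda", "mps"

# then, instead of matching on optimizer algorithm names:
if (config$backend == "torch") {
  # dispatch to the torch optimizer on config$torch_device
} else {
  # dispatch to nlopt
}
```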

args <- list(data = data,
params = list(B = private$B, M = matrix(0,self$n,self$p), S = private$S[resample, ]),
config = config)
if (config$algorithm %in% c("RPROP", "RMSPROP", "ADAM", "ADAGRAD")) # hack, to know if we're doing torch or not
Collaborator

Same as previous comment

@@ -72,18 +73,25 @@ PLNnetwork <- function(formula, data, subset, weights, penalties = NULL, control
#' @seealso [PLN_param()]
#' @export
PLNnetwork_param <- function(
backend = "nlopt",
backend = c("nlopt", "torch"),
covariance = c("fixed", "spherical", "diagonal"),
Collaborator

I'm mixed about this: to avoid potential problems, it would be better to allow only covariance = "fixed" (especially since f801ca6 reverts back to PLNnetworkfit <- R6Class(inherit = PLNfit_fixedcov)).

Contributor

@maddyduran and I discussed this and we think the use case for covariance = "spherical" or "diagonal" in PLNnetwork is when you have fewer samples than you have species. IIRC, the issue was the call to solve() in the "fixed" optimize() function.

@ctrapnell
Contributor

@mahendra-mariadassou thanks for the question. I replied to this question in the wrong place yesterday, but I think the answer is that the use case for covariance = "spherical" or "diagonal" in PLNnetwork is when you have fewer samples than you have species. IIRC, the issue was the call to solve() in the "fixed" optimize() function. If n < p, that solve() can throw an exception, never making it to the sparse estimation (which can deal with the n < p case, at least insofar as returning an answer).
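A minimal illustration of the failure mode being described (plain base R, not PLNmodels code): with fewer samples than species, the empirical covariance is rank-deficient, so a direct solve() errors out.

```r
set.seed(1)
n <- 5; p <- 20                       # fewer samples than species
Y <- matrix(rpois(n * p, lambda = 5), n, p)
S <- cov(Y)                           # rank at most n - 1 < p, hence singular
res <- try(solve(S), silent = TRUE)   # "system is computationally singular"
inherits(res, "try-error")            # TRUE
```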

@mahendra-mariadassou
Collaborator

mahendra-mariadassou commented Nov 13, 2023

Hi,

We made another pass with @jchiquet and made a few suggestions on a new tweaks branch. Since I'm not a github expert, I branched tweaks from PLN-team:master, pulled cole-trapnell-lab:master into it, and then pushed a few commits (so that it has our suggestions on top of your PR).

Here are the suggestions:

  • [44a1cbd] Slight changes to the PLNnetwork inception to make it obvious that the covariance structure is only enforced in the inception model and not in the fitted model
  • [48b5925] use config_post for all PLN*() functions
  • [6e93d99] and [772b3ad] Restore the convergence check on parameter value updates (as it can be deactivated by setting xtol_rel to Inf) and the use of future_lapply (@jchiquet will open a specific issue for the problem you mentioned when using OpenBLAS, but we'd like to fix it everywhere future_lapply is used)
  • [8ccf9af] Add a backend parameter to config_optim so that one can check whether the function is running with nlopt or torch

You can pull PLN-team:tweaks into your master and make additional changes if you want, this should automatically update this PR.

@jchiquet
Member

jchiquet commented Nov 14, 2023

To complete @mahendra-mariadassou's answer regarding lapply vs future_lapply: I suggest, as Mahendra said, that we open a specific issue, since it concerns all parallelizable actions. Like Cole, I had noted that using future with a multi-threaded BLAS/LAPACK library (such as OpenBLAS) could have detrimental consequences. I therefore suggest defining a lapply wrapper which, depending on the architecture in place, dispatches to a classic or multicore lapply. We should check whether future can handle this ('sequential' or 'multicore' plan).

This is now referenced as Issue #111
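Such a dispatcher could look roughly like this (a sketch only; the helper name `safe_lapply` and the single-thread test are illustrative, and it assumes RhpcBLASctl and future.apply are available):

```r
# Illustrative dispatcher: fall back to plain lapply when a multi-threaded
# BLAS is detected (to avoid the OpenBLAS/future deadlock), otherwise use
# the backend-agnostic future_lapply.
safe_lapply <- function(X, FUN, ...) {
  blas_threads <- tryCatch(RhpcBLASctl::blas_get_num_procs(),
                           error = function(e) 1)
  if (blas_threads > 1) {
    lapply(X, FUN, ...)
  } else {
    future.apply::future_lapply(X, FUN, ..., future.seed = TRUE)
  }
}
```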

@ctrapnell
Contributor

Hi PLN-team. Thanks for creating this. We have pulled this branch into our fork, merged in the changes from the main fork's master, and run our tests. All looks great to us!

@mahendra-mariadassou mahendra-mariadassou merged commit 227c6da into PLN-team:master Nov 28, 2023
@mahendra-mariadassou
Collaborator

Hi @ctrapnell. You probably received an automatic email from github, but since I didn't use a standard workflow and just in case. I merged all your changes (amended with our changes in PLNteam:tweaks) into master, which automatically closed the PR.

Thanks again for the PR, the various changes and more generally your interest in PLNmodels !

@ctrapnell
Contributor

ctrapnell commented Nov 28, 2023 via email

@jchiquet
Member

Thank you all very much. I'm preparing a version for CRAN today, now that Mahendra has done most of the work.

@jchiquet
Member

@ctrapnell, @maddyduran and @brgew we'd like to add you to the list of package contributors, do you agree? If so, I'll need an e-mail address for each of you (I get [email protected], [email protected] and [email protected], is this correct?)

@brgew
Contributor Author

brgew commented Nov 29, 2023 via email

@jchiquet
Member

jchiquet commented Dec 2, 2023

Thank you @brgew for your answer! I assume then that @maddyduran and @ctrapnell, who actually contributed to the PR, would like to be added as contributors.
Best

@ctrapnell
Contributor

ctrapnell commented Dec 4, 2023 via email
